[WIP] MOLT Replicator draft docs #20465
Excellent work @taroface. Not an easy doc to write, but you made it understandable and clean! Let's bottom out on some of these discussions and ensure the deprecation effort from @tuansydau reflects the reality of what we are documenting.
--source 'postgres://migration_user:password@localhost:5432/molt?sslmode=verify-full'
~~~

The source connection must point to the PostgreSQL primary instance, not a read replica.
Well we do have a flag that can still ignore replication setup for cases where folks just want a data load and don't have any need for replication setup or information. Should we clarify this? CC @Jeremyyang920
@ryanluu12345 I'll just make this statement not appear on the Bulk Load doc. Btw, does the "read replica" guidance only apply to PG replication, or also to MySQL and Oracle replication?
### Replicator metrics

By default, MOLT Replicator exports [Prometheus](https://prometheus.io/) metrics at the address specified by `--metricsAddr` (default `:30005`) at the path `/_/varz`. For example: `http://localhost:30005/_/varz`.
Did we decide on referring to it as `MOLT Replicator` generally from now on? CC @rohan-joshi @Jeremyyang920
Looking at the code, I actually see that Replicator doesn't default `metricsAddr`, which means that metrics are not enabled by default. This is stale information, since the MOLT wrapper used to set `metricsAddr` to 30005. I think we should call out that the default behavior is to not spin up metrics, but that you can set it to a port (`:30005` recommended).

Here is the code snippet that made me realize this:

~~~
cmd.Flags().StringVar(&metricsAddr, "metricsAddr", "", "start a metrics server")
~~~
{% include molt/molt-setup.md %}

## Start Fetch
So an important note here is that as part of the deprecation of the wrapper, we're mainly removing the invocations of Replicator from MOLT. However, there is some source database replication setup that we'll still need to perform for PostgreSQL specifically. The reason we have to do this is that we need to create the slot at the time we actually do the snapshot export, so we don't have gaps in data.

So that means we still need to document the behavior when we set certain `pg-*` flags for setting the publication and slots, and the relevant drop/recreate behavior. I think we'll need to discuss this a bit more in the next team meeting to clearly lay out what the behavior still is. CC @tuansydau @Jeremyyang920
</section>

<section class="filter-content" markdown="1" data-scope="mysql">

Use the `replicator mylogical` command. Replicator will automatically use the saved GTID from the staging schema, or fall back to the specified `--defaultGTIDSet` if no saved state exists.
Super nit: say "the saved GTID from the staging schema's `memo` table", so readers know where to look.
MOLT Replicator continuously replicates changes from source databases to CockroachDB as part of a [database migration]({% link molt/migration-overview.md %}). It supports live ongoing migrations to CockroachDB from a source database, and enables backfill from CockroachDB to your source database for failback scenarios to preserve a rollback option during a migration window.

MOLT Replicator consumes change data from CockroachDB changefeeds, PostgreSQL logical replication streams, MySQL GTID-based replication, and Oracle LogMiner. It applies changes to target databases while maintaining configurable consistency {% comment %}and transaction boundaries{% endcomment %}, and features an embedded TypeScript/JavaScript environment for configuration and live data transforms.
Super nit: change this to "MOLT Replicator also consumes".
## Prepare the CockroachDB cluster

{{site.data.alerts.callout_success}}
For details on enabling CockroachDB changefeeds, refer to [Create and Configure Changefeeds]({% link {{ site.current_cloud_version }}/create-and-configure-changefeeds.md %}).
We need to also ensure that the license and organization are set:

~~~
SET CLUSTER SETTING cluster.organization = 'organization';
SET CLUSTER SETTING enterprise.license = '$LICENSE';
~~~
~~~
--source 'postgres://crdb_user@localhost:26257/defaultdb?sslmode=verify-full'
~~~

For failback, MOLT Replicator uses `--targetConn` to specify the original source database and `--stagingConn` for the CockroachDB staging database.
Hmm, this might be confusing now, since we don't have to explain it in terms of source or target for the data load portion. I think it may be clearer here to describe the target connection as the destination you want the data to go to from CockroachDB sources.
MOLT Fetch replication modes will be deprecated in favor of a separate replication workflow in an upcoming release. This includes the `data-load-and-replication`, `replication-only`, and `failback` modes.
{{site.data.alerts.end}}

Use `data-load-and-replication` mode to perform a one-time bulk load of source data and start continuous replication in a single command.
@taroface just want to note that given we remove this specific mode in MOLT, I want to make sure we still document what exactly these modes look like if people want to run them manually. I'll go ahead and describe what replication-only and data-load-and-replication look like. Can you please take an action to move this content into the proper section for this new doc? I trust you with figuring out the appropriate location.

CRDB data load and replication

- Before the data load, get the latest MVCC timestamp so you have the consistent point:

  ~~~
  root@localhost:26257/molt> SELECT cluster_logical_timestamp();
    cluster_logical_timestamp
  ----------------------------------
    1759848027465101000.0000000000
  ~~~

- Create your changefeed so that `cursor=''` is set to the value from above. Now, the changefeed will send data starting from the above MVCC timestamp.

For replication-only, you can just create the changefeed and the changefeed will start sending data from "now". However, if you want to send data from a previous time, you can pass in the proper MVCC timestamp, which is of the format shown above.

Important note: make sure that the GC TTL is set appropriately so the data from the cursor you're using is still valid: https://www.cockroachlabs.com/docs/stable/protect-changefeed-data
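For concreteness, a minimal sketch of creating the changefeed from the captured timestamp (the table name and sink URI here are hypothetical; the cursor value is the example output of `cluster_logical_timestamp()` above):

~~~ sql
-- Hypothetical example: start the changefeed from the captured MVCC
-- timestamp so no rows are missed between the snapshot and replication.
CREATE CHANGEFEED FOR TABLE molt.tbl
  INTO 'webhook-https://replicator-host:30004/molt/public?insecure_tls_skip_verify=true'
  WITH resolved, cursor = '1759848027465101000.0000000000';
~~~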
To add to the GC detail: this is important to ensure that the changes from back in time, where the cursor is, are still valid and able to be consumed from a changefeed.

Configure GC TTL for a data export or migration

Before starting a data export or migration with MOLT, make sure the GC TTL for the source database is long enough to cover the full duration of the process (for example, the total time it takes for the initial data load). This ensures that historical data remains available from the changefeed when replication begins.

~~~
-- Increase GC TTL to 24 hours (example)
ALTER DATABASE <database_name> CONFIGURE ZONE USING gc.ttlseconds = 86400;
~~~

Once the changefeed or replication has started successfully (which automatically protects its own data range), you can safely lower the TTL again if necessary to resume normal garbage collection:

~~~
-- Restore GC TTL to 5 minutes
ALTER DATABASE <database_name> CONFIGURE ZONE USING gc.ttlseconds = 300;
~~~

Note: the time in seconds will depend on the user's expected time for the initial data load, and it must be higher than that number.
---
title: Load and Replicate
A more general note than the one below: the customer should ensure that they get the proper replication consistent point BEFORE they do a data load, so that we can ensure we don't have any data gaps.

I wonder if we can make a callout that if folks want to do the full data load and replication and have consistency, they first gather the consistent point (which we document in the replicator setup sections):

- SCN for Oracle
- LSN for Postgres
- GTID for MySQL
- Cursor for CockroachDB

Second, run the data load until completion. Third, run replication from the consistent points obtained in the steps above.

CC @tuansydau I think it's fairly crucial we log these out for users so they can at least have information on where they should start from.
@taroface WRT the replication slot behaviour we were talking about in the meeting:

As discussed, since we are removing the replication modes from MOLT, it is up to the user to make sure that they have the correct consistent point for replication later on, when they use Replicator to continue their migrations after data load. Like Ryan said, we will be printing out the SCN/LSN/GTID/Cursor to mark the consistent point to continue replication from after the initial data load is initiated.

The final quirk is that for PG migrations to Cockroach targets, we decided to make it so that the Postgres replication slots aren't created by default, and are only created if the user passes in the `--pglogical-replication-slot-name` and `--pglogical-publication-and-slot-drop-and-recreate` flags. This is important to note, because if users don't create the replication slot during data load, they will have to rerun the data load in order to get the slot created and output the consistent replication point.
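For reference, a minimal sketch of what the slot and publication creation amounts to in plain PostgreSQL (the names `molt_pub` and `molt_slot` are hypothetical; the flags above may manage these objects differently):

~~~ sql
-- Create a publication covering the tables to replicate.
CREATE PUBLICATION molt_pub FOR ALL TABLES;

-- Create a logical replication slot; the returned consistent_point LSN
-- is the position replication can later resume from.
SELECT * FROM pg_create_logical_replication_slot('molt_slot', 'pgoutput');
~~~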
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;

-- Verify supplemental logging
SELECT supplemental_log_data_min, supplemental_log_data_pk FROM v$database;
-- Expected: SUPPLEMENTAL_LOG_DATA_MIN: IMPLICIT (or YES), SUPPLEMENTAL_LOG_DATA_PK: YES
Suggested change:

~~~
-- Enable minimal supplemental logging for primary keys
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;

-- Verify supplemental logging status
SELECT supplemental_log_data_min, supplemental_log_data_pk FROM v$database;
-- Expected:
--   SUPPLEMENTAL_LOG_DATA_MIN: IMPLICIT (or YES)
--   SUPPLEMENTAL_LOG_DATA_PK: YES
~~~
SELECT MIN(t.START_SCNB) FROM V$TRANSACTION t;
~~~

Use the results as follows:
Suggested change:

Use the query results by providing the following flag values to `replicator`:
-- Query the current SCN from Oracle
SELECT CURRENT_SCN FROM V$DATABASE;

-- Query the starting SCN of the earliest active transaction
SELECT MIN(t.START_SCNB) FROM V$TRANSACTION t;
~~~
There could be a correctness issue here; the following should work instead:

~~~
-- 1) Capture an SCN before inspecting active transactions
SELECT CURRENT_SCN AS before_active_scn FROM V$DATABASE;

-- 2) Find the earliest active transaction start SCN
SELECT MIN(t.START_SCNB) AS earliest_active_scn FROM V$TRANSACTION t;

-- 3) Capture the snapshot SCN after the checks
SELECT CURRENT_SCN AS snapshot_scn FROM V$DATABASE;
~~~
- `--scn`: Use the result from the first query (current SCN).
- `--backfillFromSCN`: Use the result from the second query (earliest active transaction SCN). If the second query returns no results, use the result from the first query instead.
Suggested change:

Compute the flags for `replicator` as follows:

- `--backfillFromSCN`: use the smaller value between `before_active_scn` and `earliest_active_scn`. If `earliest_active_scn` has no value, use `before_active_scn`.
- `--scn`: use `snapshot_scn`.

Make sure `--scn` is greater than or equal to `--backfillFromSCN`.
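The selection logic above can be sketched as follows (the function and variable names are illustrative only, not part of `replicator`):

```python
def compute_scn_flags(before_active_scn, earliest_active_scn, snapshot_scn):
    """Pick the --backfillFromSCN and --scn values from the three queries.

    earliest_active_scn is None when V$TRANSACTION returned no rows.
    """
    if earliest_active_scn is None:
        backfill_from_scn = before_active_scn
    else:
        backfill_from_scn = min(before_active_scn, earliest_active_scn)
    scn = snapshot_scn
    # Sanity check: the snapshot SCN must not precede the backfill start.
    assert scn >= backfill_from_scn
    return backfill_from_scn, scn
```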
@taroface We should also make the user grab their own copy of Instant Client from Oracle. For the Linux instructions, we should replace:

~~~
sudo apt-get install -yqq --no-install-recommends libaio1t64
sudo ln -s /usr/lib/x86_64-linux-gnu/libaio.so.1t64 /usr/lib/x86_64-linux-gnu/libaio.so.1
curl -o /tmp/ora-libs.zip https://replicator.cockroachdb.com/third_party/instantclient-basiclite-linux-amd64.zip
unzip -d /tmp /tmp/ora-libs.zip
sudo mv /tmp/instantclient_21_13/* /usr/lib
export LD_LIBRARY_PATH=/usr/lib
~~~

With:

~~~
sudo apt-get install -yqq --no-install-recommends libaio1t64
sudo ln -s /usr/lib/x86_64-linux-gnu/libaio.so.1t64 /usr/lib/x86_64-linux-gnu/libaio.so.1
# Download the Oracle Instant Client libraries from Oracle
# (https://www.oracle.com/ca-en/database/technologies/instant-client.html)
# into /tmp/instantclient-basiclite-linux-amd64.zip, for example
unzip -d /tmp /tmp/instantclient-basiclite-linux-amd64.zip
sudo mv /tmp/instantclient_21_13/* /usr/lib
export LD_LIBRARY_PATH=/usr/lib
~~~

Let me know if you have questions on this; it should be updated for each instance of the Oracle Instant Client instructions throughout the docs.
DOC-13338
DOC-14748
This PR is still WIP.
Notes for reviewers: